fix(argocd): extract revision from multi-source application revisions[] #8810

Merged: rbstp merged 2 commits into apache:main from vemulaanvesh:fix/argocd-multi-source-revision on Apr 3, 2026

vemulaanvesh (Contributor) commented Mar 25, 2026

Summary

ArgoCD multi-source applications (those using spec.sources[] instead of spec.source) store one revision per source in a revisions[] array on both history entries and operationState. The existing extractor only read the single-source revision field, which is always empty for multi-source apps.

Impact: cicd_deployment_commits was never populated for multi-source apps, so ArgoCD deployments were invisible to DORA Deployment Frequency and Lead Time for Changes metrics.

Relates to: #5207

Root cause

// Multi-source history entry from ArgoCD API
{
  "id": 41,
  "deployedAt": "2026-03-19T18:07:59Z",
  "revision": "",          ← always empty for multi-source
  "revisions": [
    "2.6.2",               ← Helm chart version (GCS/OCI)
    "5dd95b4efd7e9b668c361bbddb8d7f1e56c32ac1"  ← git commit SHA (GitHub)
  ],
  "sources": [
    {"repoURL": "gs://charts-example/infra/stable", "chart": "generic-service"},
    {"repoURL": "https://github.com/example/my-repo"}
  ]
}

The extractor set syncOp.Revision = apiOp.Revision, which is an empty string for multi-source apps. convertSyncOperations writes to cicd_deployment_commits only when syncOp.Revision != "", so the write was skipped for every multi-source deployment.

Changes

sync_operation_extractor.go

  • Add Revisions []string and Sources []ArgocdApiSyncSource to ArgocdApiSyncOperation struct.
  • Add Revisions []string to the SyncResult nested struct for operationState entries.
  • When revision is empty, call resolveMultiSourceRevision() before any skip logic.
  • resolveMultiSourceRevision() — picks the git commit SHA from revisions[]:
    • Pass 1: prefers the revision whose corresponding source URL belongs to a known git hosting service (GitHub, GitLab, Bitbucket, Azure DevOps, Gitea, Forgejo).
    • Pass 2: falls back to any 40-hex string regardless of source type (covers self-hosted instances).
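
The two passes described above can be sketched as follows. This is a minimal illustration, not the code merged in the PR: the syncSource type, the gitHostMarkers list, and the substring-based host matching are assumptions, and requiring a 40-hex value in pass 1 is also an assumption, since the description does not say whether pass 1 validates the revision.

```go
package main

import (
	"net/url"
	"regexp"
	"strings"
)

// syncSource is a hypothetical stand-in for the PR's ArgocdApiSyncSource;
// only the two fields visible in the sample payload are modelled.
type syncSource struct {
	RepoURL string `json:"repoURL"`
	Chart   string `json:"chart"`
}

// commitSHAPattern matches a full 40-character hexadecimal git commit SHA.
var commitSHAPattern = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)

// gitHostMarkers is an assumed list of host substrings for the services the
// PR names; the real matching rules are not shown in the description.
var gitHostMarkers = []string{"github", "gitlab", "bitbucket", "dev.azure.com", "gitea", "forgejo"}

// isCommitSHA reports whether s looks like a full git commit SHA.
func isCommitSHA(s string) bool {
	return commitSHAPattern.MatchString(s)
}

// isGitHostedURL reports whether the URL's host contains a known marker.
// URLs that net/url cannot parse (for example scp-style git@host:path)
// return false here and are left to the pass 2 fallback.
func isGitHostedURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return false
	}
	host := strings.ToLower(u.Host)
	for _, marker := range gitHostMarkers {
		if strings.Contains(host, marker) {
			return true
		}
	}
	return false
}

// resolveMultiSourceRevision picks one commit SHA out of revisions[].
func resolveMultiSourceRevision(revisions []string, sources []syncSource) string {
	// Pass 1: prefer a SHA whose source (same index) is on a known git host.
	for i, rev := range revisions {
		if i < len(sources) && isGitHostedURL(sources[i].RepoURL) && isCommitSHA(rev) {
			return rev
		}
	}
	// Pass 2: accept any 40-hex revision, which covers self-hosted instances.
	for _, rev := range revisions {
		if isCommitSHA(rev) {
			return rev
		}
	}
	return ""
}
```

With the sample payload from the root-cause section, the Helm chart version "2.6.2" is skipped and the GitHub commit SHA is returned; a list containing only semver tags yields an empty string.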

application_extractor.go

  • Add Sources []ArgocdApiApplicationSource to ArgocdApiApplication.
  • When spec.source.repoURL is empty, resolve the primary source from spec.sources[] using the same git-host heuristic so ArgocdApplication.RepoURL is set to the browsable git repo URL rather than a Helm registry address.

Tests added (sync_operation_extractor_test.go)

| Test | Scenario |
| --- | --- |
| TestResolveMultiSourceRevision_GitHubSourceWins | GCS chart + GitHub values repo → picks GitHub SHA |
| TestResolveMultiSourceRevision_GitLabSourceWins | OCI chart + GitLab → picks GitLab SHA |
| TestResolveMultiSourceRevision_FallbackToAnySHA | Gitea host in list → picked by heuristic |
| TestResolveMultiSourceRevision_EmptyRevisions | nil / empty input → returns "" |
| TestResolveMultiSourceRevision_AllSemver | all semver tags, no git SHA → returns "" |
| TestResolveMultiSourceRevision_SingleGitSHA | single-element revisions slice |
| TestIsCommitSHA | 40-hex validation edge cases |
| TestIsGitHostedURL | git host detection, chart registries return false |

Backward compatibility

  • Single-source apps are unaffected: revision is non-empty so resolveMultiSourceRevision is not called.
  • The new struct fields are purely additive JSON tags; existing serialised raw data without these fields deserialises to zero values without error.
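
The additive-field claim can be checked with a small decoding sketch. The historyEntry type and decodeHistoryEntry helper below are hypothetical, trimmed stand-ins for the plugin's ArgocdApiSyncOperation handling; the behaviour shown (absent JSON keys leave Go fields at their zero values) is standard encoding/json behaviour.

```go
package main

import "encoding/json"

// historyEntry is a hypothetical, trimmed version of a history entry.
// Revisions is the newly added field; older raw data does not contain it.
type historyEntry struct {
	ID        int64    `json:"id"`
	Revision  string   `json:"revision"`
	Revisions []string `json:"revisions"`
}

// decodeHistoryEntry unmarshals one raw history entry. Keys missing from
// the payload are simply left at their zero value, with no error.
func decodeHistoryEntry(raw []byte) (historyEntry, error) {
	var entry historyEntry
	err := json.Unmarshal(raw, &entry)
	return entry, err
}
```

Decoding a stored single-source payload without a "revisions" key returns a nil Revisions slice and a populated Revision, so the single-source code path is unchanged.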

dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and pr-type/bug-fix (this PR fixes a bug) labels Mar 25, 2026
Comment thread .github/workflows/build-custom-image.yml Outdated
Comment thread backend/Dockerfile
@vemulaanvesh vemulaanvesh force-pushed the fix/argocd-multi-source-revision branch from cfc0c3d to 80f81ac Compare March 27, 2026 15:22
Comment thread backend/plugins/argocd/tasks/sync_operation_extractor_test.go Outdated
rbstp (Contributor) commented Mar 28, 2026

lgtm

rbstp previously approved these changes Mar 28, 2026
vemulaanvesh (Contributor, Author) commented

@rbstp can merge this ?

rbstp (Contributor) commented Mar 28, 2026

Quoting vemulaanvesh: "@rbstp can merge this ?"

I am not sure if a maintainer wants to have a look first @klesh

vemulaanvesh (Contributor, Author) commented

Hi @klesh can i get your eyes here to merge ?

Vemula Anvesh added 2 commits April 1, 2026 11:48
ArgoCD multi-source applications (spec.sources[]) store one revision per
source in a revisions[] array on history entries and operationState.
The existing extractor only read the single-source revision field, which
is always empty for multi-source apps, so cicd_deployment_commits was
never populated and ArgoCD deployments were invisible to DORA metrics.

Changes:
- sync_operation_extractor.go
  - Add Revisions []string and Sources []ArgocdApiSyncSource fields to
    ArgocdApiSyncOperation to deserialise multi-source payloads.
  - Add Revisions []string to SyncResult for operationState entries.
  - Call resolveMultiSourceRevision() when revision is empty, both for
    history entries and for operationState.
  - Add resolveMultiSourceRevision(): prefers the revision whose source URL
    belongs to a known git hosting service (GitHub, GitLab, Bitbucket, Azure
    DevOps, Gitea, Forgejo); falls back to any 40-hex commit SHA.
  - Add isGitHostedURL() and isCommitSHA() helpers.

- application_extractor.go
  - Add Sources []ArgocdApiApplicationSource to ArgocdApiApplication so
    multi-source app metadata is deserialised.
  - Resolve the primary git-hosted source from spec.sources[] when
    spec.source.repoURL is empty, ensuring ArgocdApplication.RepoURL is
    a browsable repository URL rather than a Helm chart registry address.

Fixes: multi-source ArgoCD apps produce 0 rows in cicd_deployment_commits
Relates-to: apache#5207

…ployment_commits

Previously cicd_deployment_commits.repo_url was populated by reading
application.RepoURL in convertSyncOperations. For multi-source apps
_tool_argocd_applications.repo_url is empty when extractApplications
is skipped (DevLake's collector state cache omits it when no new
application raw data has been collected), causing the repo_url field
to fall back to the deployment-name placeholder and breaking DevLake's
commits_diff to PR linkage for DORA Lead Time metrics.

Fix: resolve the git repo URL during extractSyncOperations, which
always runs regardless of collector state. The resolved URL is stored
in a new ArgocdSyncOperation.RepoURL field and used as the primary
source in convertSyncOperations, with application.RepoURL as fallback.

Changes:
- models/sync_operation.go: add RepoURL varchar(500) field
- models/migrationscripts/: add migration 20260331 to ADD COLUMN
- tasks/sync_operation_extractor.go:
  - add resolveGitRepoURL(singleSourceURL, sources): prefers known
    git-hosting URLs from sources[], falls back to first non-chart URL
  - populate syncOp.RepoURL after revision resolution
- tasks/sync_operation_convertor.go: use syncOp.RepoURL first, then
  application.RepoURL, then deployment name as last resort
- tasks/sync_operation_extractor_test.go: 6 new tests for
  resolveGitRepoURL covering single-source, multi-source, fallback,
  all-chart, and empty inputs
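
A rough sketch of the URL resolution and the convertor's precedence described in this commit message follows. It is an illustration under assumptions: the source type, the gitHostMarkers list, and pickRepoURL are hypothetical names, "non-chart URL" is read here as a source whose chart field is empty, and returning singleSourceURL unchanged when it is set is likewise assumed.

```go
package main

import (
	"net/url"
	"strings"
)

// source is a hypothetical stand-in for one entry of spec.sources[].
type source struct {
	RepoURL string
	Chart   string
}

// gitHostMarkers is an assumed list of host substrings for known git hosts.
var gitHostMarkers = []string{"github", "gitlab", "bitbucket", "dev.azure.com", "gitea", "forgejo"}

// isGitHostedURL reports whether the URL's host contains a known marker.
func isGitHostedURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return false
	}
	host := strings.ToLower(u.Host)
	for _, marker := range gitHostMarkers {
		if strings.Contains(host, marker) {
			return true
		}
	}
	return false
}

// resolveGitRepoURL returns the single-source URL when present, otherwise
// the first known git-hosting URL, otherwise the first source without a chart.
func resolveGitRepoURL(singleSourceURL string, sources []source) string {
	if singleSourceURL != "" {
		return singleSourceURL
	}
	for _, s := range sources {
		if isGitHostedURL(s.RepoURL) {
			return s.RepoURL
		}
	}
	for _, s := range sources {
		if s.RepoURL != "" && s.Chart == "" {
			return s.RepoURL
		}
	}
	return ""
}

// pickRepoURL applies the convertor's precedence: the sync operation's URL,
// then the application's URL, then the deployment name as a last resort.
func pickRepoURL(syncOpRepoURL, applicationRepoURL, deploymentName string) string {
	if syncOpRepoURL != "" {
		return syncOpRepoURL
	}
	if applicationRepoURL != "" {
		return applicationRepoURL
	}
	return deploymentName
}
```

Resolving the URL at extraction time means the convertor reaches the deployment-name placeholder only when neither the sync operation nor the application carries a usable URL.
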
klesh (Contributor) commented Apr 1, 2026

@rbstp, feel free to merge once all checks pass—I'm not close enough to the details to add anything else.

@vemulaanvesh vemulaanvesh force-pushed the fix/argocd-multi-source-revision branch from 032e1aa to 134c91f Compare April 1, 2026 11:53
@vemulaanvesh vemulaanvesh requested a review from rbstp April 1, 2026 11:57
@rbstp rbstp merged commit 94f7bca into apache:main Apr 3, 2026
10 checks passed
la-tamas pushed a commit to archfz/incubator-devlake that referenced this pull request Apr 9, 2026
ewega added a commit that referenced this pull request Jun 5, 2026
…ency (#8811)

* fix: Add backfill time window to ensure data consistency

chore: Add backfill time window to ensure data consistency
(cherry picked from commit 52a9555b782860f9c8ba9db409bd56e0c8f58272)

* fix(q_dev): prevent data duplication in user_report and user_data tables (#8737)

* fix(q_dev): prevent data duplication in user_report and user_data tables

Replace auto-increment ID with composite primary keys so that
CreateOrUpdate can properly deduplicate rows on re-extraction.

- user_report PK: (connection_id, scope_id, user_id, date, client_type)
- user_data PK: (connection_id, scope_id, user_id, date)
- Switch db.Create() to db.CreateOrUpdate() in s3_data_extractor
- Migration drops old tables, rebuilds with new PKs, resets s3_file_meta
  processed flag to trigger re-extraction

* fix(q_dev): gofmt archived user_data_v2 model

* feat(github): Extend exclusion of file extensions to github plugin (#8719)

* feat(github): extend PR size exclusion for specified file extension to github plugin

* fix: register migration script

* fix: move PR size to 'Additional settings' and change so the comma doesn't get removed while typing

* fix: linting

* fix(doc): update expired Slack invite links in README (#8739)

The Slack invite links in README.md were expired and returning
"This link is no longer active." Updated both occurrences (badge
and community section) to match the current link on the official
DevLake website.

Closes #8738

Co-authored-by: Spiff Azeta <spiffazeta@gmail.com>

* docs: add gh-devlake CLI to Getting Started installation options (#8733)

Adds gh-devlake as a third installation method alongside Docker Compose
and Helm. gh-devlake is a GitHub CLI extension that automates DevLake
deployment, configuration, and monitoring from the terminal.

Closes #8732

* fix(gitlab): add missing repos scope in project_mapping (#8743)

GitLab's makeScopeV200 did not create a repos scope when
scopeConfig.Entities was empty or only contained CROSS. This
caused project_mapping to have no table='repos' row, breaking
downstream DORA metrics, PR-issue linking, and all PR dashboard
panels that join on project_mapping.

The fix aligns GitLab with the GitHub plugin by:
1. Defaulting empty entities to plugin.DOMAIN_TYPES
2. Adding DOMAIN_TYPE_CROSS to the repo scope condition

Closes #8742

Co-authored-by: Spiff Azeta <spiffazeta@gmail.com>

* fix(grafana): update dashboard descriptions to list all supported data sources (#8741)

Several dashboard introduction panels hardcoded "GitHub and Jira" as
required data sources, even though the underlying queries use generic
domain layer tables that work with any supported Git tool or issue
tracker. Updated to list all supported sources following the pattern
already used by DORA and WorkLogs dashboards.

Closes #8740

Co-authored-by: Spiff Azeta <spiffazeta@gmail.com>

* fix: modify cicd_deployments name from varchar to text (#8724)

* fix: modify cicd_deployments name from varchar to text

* fix: update the year

* fix(q_dev): replace MariaDB-specific IF NOT EXISTS syntax with DAL methods for MySQL 8.x compatibility (#8745)

* fix(azuredevops): default empty entities and add CROSS to repo scope in makeScopeV200 (#8751)

When scopeConfig.Entities is empty (common when no entities are
explicitly selected in the UI), makeScopeV200 produced zero scopes,
leaving project_mapping with no rows. Additionally, the repo scope
condition did not check for DOMAIN_TYPE_CROSS, so selecting only
CROSS would not create a repo scope, breaking DORA metrics.

This adds the same fixes applied to GitLab in #8743.

Closes #8749

* fix(bitbucket): default empty entities to all domain types in makeScopesV200 (#8750)

When scopeConfig.Entities is empty (common when no entities are
explicitly selected in the UI), makeScopesV200 produced zero scopes,
leaving project_mapping with no repo rows. This adds the same
empty-entities default applied to GitLab in #8743.

Closes #8748

* feat(circleci): add server version requirement and endpoint help text (#8757)

Update CircleCI connection form to indicate Server v4.x+ requirement
and provide guidance for server endpoint configuration.

Signed-off-by: Joshua Smith <jbsmith7741@gmail.com>

* feat(asana): add Asana plugin for project and task collection (#8758)

Add a new Asana plugin that integrates with Asana's REST API to collect
projects, sections, tasks, subtasks, stories (comments), tags, and users,
mapping them to DevLake's ticket/board domain model.

Backend:
- Plugin implementation with all required interfaces (PluginMeta,
  PluginTask, PluginModel, PluginMigration, PluginSource, PluginApi,
  DataSourcePluginBlueprintV200)
- Collectors, extractors, and converters for projects, sections, tasks,
  subtasks, stories, tags, and users
- Remote API scope picker (Workspaces -> Teams/Portfolios -> Projects)
- Scope config with issue-type regex transformation rules
- Migration scripts for schema evolution
- E2E tests with CSV fixtures for project and task data flows

Config UI:
- Plugin registration with connection form (PAT auth, endpoint, proxy)
- Scope config transformation form for issue-type mapping
- Dashboard URL integration for onboarding flow

Grafana:
- Asana dashboard with task metrics and visualizations

Made-with: Cursor

* feat: GitHub App token refresh (#8746)

* feat(github): auto-refresh GitHub App installation tokens

Add transport-level token refresh for GitHub App (AppKey) connections.
GitHub App installation tokens expire after ~1 hour; this adds proactive
refresh (before expiry) and reactive refresh (on 401) using the existing
TokenProvider/RefreshRoundTripper infrastructure.

New files:
- app_installation_refresh.go: refresh logic + DB persistence
- refresh_api_client.go: minimal ApiClient for token refresh POST
- cmd/test_refresh/main.go: manual test script for real GitHub Apps

Modified:
- connection.go: export GetInstallationAccessToken, parse ExpiresAt
- token_provider.go: add refreshFn for pluggable refresh strategies
- round_tripper.go: document dual Authorization header interaction
- api_client.go: wire AppKey connections into refresh infrastructure
- Tests updated for new constructors and AppKey refresh flow

* feat(github): add diagnostic logging to GitHub App token refresh

Add structured logging at key decision points for token refresh:
- Token provider creation (connection ID, installation ID, expiry)
- Round tripper installation (connection ID, auth method)
- Proactive refresh trigger (near-expiry detection)
- Refresh start/success/failure (old/new token prefixes, expiry times)
- DB persistence success/failure
- Reactive 401 refresh and skip-due-to-concurrent-refresh

All logs route through the DevLake logger to pipeline log files.

* fix(github): prevent deadlock and fix token persistence in App token refresh

Deadlock fix: NewAppInstallationTokenProvider now captures client.Transport
(the base transport) before wrapping with RefreshRoundTripper. The refresh
function uses newRefreshApiClientWithTransport(baseTransport) to POST for
new installation tokens, bypassing the RefreshRoundTripper entirely.

Token persistence fix: PersistEncryptedTokenColumns() manually encrypts
tokens via plugin.Encrypt() then writes ciphertext via dal.UpdateColumns
with conn.TableName() (a string) as the first argument. Passing the table
name string makes GORM use Table() instead of Model(), preventing the
encdec serializer from corrupting the in-memory token value. The encryption
secret is threaded from taskCtx.GetConfig(ENCRYPTION_SECRET) through
CreateApiClient to TokenProvider to persist functions.

Also persists the initial App token at startup for DB consistency, and
adds TestProactiveRefreshNoDeadlock with a real RSA key to verify the
deadlock scenario is resolved.

* fix(github): remove unused refresh client constructor and update tests

---------

Co-authored-by: Spiff Azeta <35563797+spiffaz@users.noreply.github.com>
Co-authored-by: Spiff Azeta <spiffazeta@gmail.com>
Co-authored-by: Dan Crews <crewsd@gmail.com>
Co-authored-by: Tomoya Kawaguchi <68677002+yamoyamoto@users.noreply.github.com>

* fix: cwe89 sql injection (#8762)

* feat(q-dev): add logging data ingestion and enrich Kiro dashboards (#8767)

* feat(q-dev): add logging data ingestion and enrich Kiro dashboards

Add support for ingesting S3 logging data (GenerateAssistantResponse and
GenerateCompletions events) into new database tables, and enrich all three
Kiro Grafana dashboards with additional metrics.

Changes:
- New models: QDevChatLog and QDevCompletionLog for logging event data
- New extractor: s3_logging_extractor.go parses JSON.gz logging files
- Updated S3 collector to also handle .json.gz files
- Added logging S3 prefixes (GenerateAssistantResponse, GenerateCompletions)
- New dashboard: "Kiro AI Activity Insights" with 10 panels including
  model usage distribution, active hours, conversation depth, feature
  adoption (Steering/Spec), file type usage, and prompt/response trends
- Enriched "Kiro Code Metrics Dashboard" with DocGeneration, TestGeneration,
  and Dev (Agentic) metric panels
- Fixed "Kiro Usage Dashboard" per-user table to sort by user_id
- Migration script for new tables

* fix(q-dev): use separate base path for logging S3 prefixes

Logging data lives under a different S3 prefix ("logging/") than user
report data ("user-report/"). Add LoggingBasePath option (defaults to
"logging") so logging prefixes are constructed correctly.

* fix(q-dev): auto-scan logging path without extra config

Kiro exports to two well-known S3 prefixes in the same bucket:
- user-report/AWSLogs/{accountId}/KiroLogs/ (CSV reports)
- logging/AWSLogs/{accountId}/KiroLogs/ (interaction logs)

When AccountId is set, automatically scan both paths. The "logging"
prefix is hardcoded since it's a standard Kiro export convention.
No additional configuration needed.

* fix(q-dev): update scope tooltip to mention logging data scanning

* fix(q-dev): fix scope ID routing and CSV/JSON file separation

Three fixes:
1. Use *scopeId (catch-all) route pattern instead of :scopeId so scope
   IDs containing "/" (e.g. "034362076319/2026") work in URL paths
2. CSV extractor now filters for .csv files only, preventing it from
   trying to parse .json.gz logging files as CSV
3. Frontend scope API calls now encodeURIComponent(scopeId) for safe
   URL encoding

* fix(q-dev): resolve *scopeId route conflict with dispatcher pattern

The catch-all *scopeId route conflicts with *scopeId/latest-sync-state.
Follow Jenkins/Bitbucket pattern: use a single *scopeId route with a
GetScopeDispatcher that checks for /latest-sync-state suffix and
dispatches accordingly. All scope handlers now TrimLeft "/" from scopeId.

* fix(q-dev): use URL-safe scope ID format (underscore separator)

Scope IDs like "034362076319/2026" break URL routing because "/" is a
path separator. Change ID format to "034362076319_2026" (underscore)
when AccountId is set. The Prefix field still uses "/" for S3 path
matching. Revert to standard :scopeId routes since IDs are now safe.

Note: existing scopes need to be recreated after this change.

* fix(q-dev): use NoPKModel instead of Model in archived logging models

archived.Model only has ID+timestamps, missing RawDataOrigin fields
(_raw_data_params etc.) that common.NoPKModel includes. This caused
"Unknown column '_raw_data_params'" errors at runtime.

* fix(q-dev): fix GROUP BY in per-user table to merge display_name variants

Remove display_name from GROUP BY so same user_id with different
display_name values gets merged. Use MAX(display_name) in SELECT.

* fix(q-dev): normalize logging user IDs to match CSV short UUID format

Logging data uses "d-{directoryId}.{UUID}" format while CSV user-report
uses plain "{UUID}". Strip the "d-xxx." prefix so the same user maps to
one user_id across both data sources.
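
The prefix stripping described above could look roughly like this. The function name stripDirectoryPrefix is hypothetical (the commit messages call the real helper normalizeUserId), and the exact parsing rules are an assumption based only on the "d-{directoryId}.{UUID}" format quoted here.

```go
package main

import "strings"

// stripDirectoryPrefix turns "d-{directoryId}.{UUID}" into "{UUID}".
// IDs that do not start with "d-" or have nothing after the first dot
// are returned unchanged.
func stripDirectoryPrefix(userID string) string {
	if !strings.HasPrefix(userID, "d-") {
		return userID
	}
	dot := strings.Index(userID, ".")
	if dot < 0 || dot == len(userID)-1 {
		return userID
	}
	return userID[dot+1:]
}
```

Because a plain UUID never starts with "d-", IDs from the CSV user-report pass through untouched while logging IDs lose their directory prefix.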

* fix(q-dev): normalize user IDs in CSV extractors and sort table DESC

Apply normalizeUserId to both createUserReportData and
createUserDataWithDisplayName so user_report CSV data also strips
the "d-{directoryId}." prefix. Change per-user table sort to
ORDER BY user_id DESC.

* style(q-dev): fix gofmt formatting in chat_log models

* perf(q-dev): parallelize logging S3 downloads and batch DB writes

Optimize logging extractor performance:
- 10 goroutine workers for parallel S3 file downloads
- Batch 50 files per DB transaction instead of 1-per-file
- sync.Map cache for display name resolution (avoid repeated IAM calls)
- Parse records in memory during download, write all at once

This should improve throughput from ~1.5 files/sec to ~15+ files/sec
for typical logging file sizes.
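
The worker-pool and batching pattern listed above can be sketched in a few lines. This is a generic illustration, not the extractor's code: downloadAll, chunk, and the fetch callback are hypothetical names, and the database transaction per batch is left out.

```go
package main

import "sync"

// downloadAll calls fetch for every key using a fixed number of worker
// goroutines and returns the results in the same order as keys.
func downloadAll(keys []string, workers int, fetch func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(keys))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so the
				// writes to results never overlap.
				results[i] = fetch(keys[i])
			}
		}()
	}
	for i := range keys {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// chunk splits items into consecutive batches of at most size elements,
// so that each batch can be written in a single transaction.
func chunk(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}
```

With 10 workers and a batch size of 50, as in the commit message, 120 downloaded files would be written in three transactions of 50, 50, and 20 files.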

* fix(q-dev): check tx.Rollback error return to satisfy errcheck lint

* feat(q-dev): add per-user model usage table and models column

Add "Per-User Model Usage" table (panel 11) showing each user's
request count and avg prompt/response length per model_id. Also add
"Models Used" column to the Per-User Activity table.

* fix(q-dev): remove per-user model usage table, keep models column only

* feat(q-dev): add Kiro Executive Dashboard with cross-source analytics

New dashboard "Kiro Executive Dashboard" with 12 panels covering:
- KPIs: WAU, credits efficiency, acceptance rate, steering adoption
- Trends: weekly active users, new vs returning users
- Adoption funnel: Chat→Inline→CodeFix→Review→DocGen→TestGen→Agentic→Steering→Spec
- Cost: credits pace vs projected monthly, idle power users
- Quality: acceptance rate trends, code review findings, test generation
- Efficiency: per-user productivity table with credits/line ratio

Correlates data across user_report (credits), user_data (code metrics),
and chat_log (interaction patterns) for holistic Kiro usage insights.

* fix(q-dev): fix pie charts to show per-row slices instead of single total

Set reduceOptions.values=true so Grafana treats each SQL result row as
a separate pie slice. Fixes Model Usage Distribution, File Type Usage,
Kiro Feature Adoption, and Active File Types pie charts.

* fix(q-dev): cast Hour to string for Active Hours bar chart x-axis

* fix(q-dev): fix pie chart single-slice and GROUP BY display_name issues

1. qdev_user_report Panel 4 (Subscription Tier Distribution): set
   reduceOptions.values=true to show per-tier slices
2. qdev_user_data Panel 6 (User Interactions): remove display_name
   from GROUP BY, use MAX(display_name) to merge same user

* fix(q-dev): prevent data inflation in user_report JOIN user_data

user_report has multiple rows per (user_id, date) due to client_type
(KIRO_IDE, KIRO_CLI), but user_data has only one row per (user_id, date).
A direct JOIN causes user_data metrics to be counted multiple times.

Fix: pre-aggregate user_report by (user_id, date) in a subquery before
joining, so the JOIN is always 1:1.

Affects: Credits Efficiency stat and User Productivity table.

* feat(qa): add is_invalid field to qa_test_case_executions (#8764)

* feat(qa): add is_invalid field to qa_test_case_executions

Add is_invalid boolean field to the domain layer qa_test_case_executions
table to allow QA teams to flag test executions as invalid due to
environmental issues, flaky tests, false positives, or false negatives.

Changes:
- Add IsInvalid field to QaTestCaseExecution domain model
- Create migration script (20260313_add_is_invalid_to_qa_test_case_executions)
- Register migration in migrationscripts/register.go
- Update customize service to set default value for is_invalid
- Update E2E test data to include new column

Resolves #8763

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(qa): handle missing is_invalid column in CSV import

Fix PostgreSQL compatibility issue when CSV files don't contain
the is_invalid column. The field now defaults to false instead
of an empty string.

Changes:
- Update qaTestCaseExecutionHandler to check for empty string values
- Add E2E test for backward compatibility with CSV files lacking is_invalid
- Add explicit IsInvalid initialization in Testmo plugin converter

Resolves #8763

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(linker): link when branch names contain issue keys (#8777)

* feat(linker): branch names containing issue keys

* chore: add testing data

* Add codespell support with configuration and fixes (#8761)

* ci(codespell): add codespell config and GitHub Actions workflow

Add .codespellrc with skip patterns for generated files, camelCase/PascalCase
ignore-regex, and project-specific word list (convertor, crypted, te, thur).
Add GitHub Actions workflow to run codespell on push to main and PRs.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>

* fix(codespell): fix ambiguous typos requiring context review

Manual fixes for typos that needed human review to avoid breaking code:
- Comment/string typos: occured->occurred, destory->destroy, writting->writing,
  retreive->retrieve, identifer->identifier, etc.
- Struct field comments and documentation corrections
- Migration script comment fixes (preserving Go identifiers like DataConvertor)

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>

* fix(codespell): fix non-ambiguous typos with codespell -w

Automated fix via `codespell -w` for clear-cut typos across backend, config-ui,
and grafana dashboards. Examples: sucess->success, occurence->occurrence,
exeucte->execute, asynchornous->asynchronous, Grafana panel typos, etc.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>

---------

Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

* feat(q-dev): enrich logging fields, separate dashboards, add E2E tests (#8786)

* feat(q-dev): enrich logging fields, separate dashboards by data source, add E2E tests

- Add new fields to chat_log: CodeReferenceCount, WebLinkCount, HasFollowupPrompts
  (from codeReferenceEvents, supplementaryWebLinksEvent, followupPrompts in JSON)
- Add new fields to completion_log: LeftContextLength, RightContextLength
  (from leftContext/rightContext in JSON)
- Update s3_logging_extractor to parse and populate new fields
- Add migration script 20260319_add_logging_fields
- Create qdev_feature_metrics dashboard for legacy by_user_analytic data
- Reorganize qdev_executive dashboard with Row dividers labeling data sources
  and cross-dashboard navigation links
- Enrich qdev_logging dashboard with new panels:
  Chat Trigger Type Distribution, Response Enrichment Breakdown,
  Completion Context Size Trends, Response Enrichment Trends
- Fix SQL compatibility with only_full_group_by mode in executive dashboard
  (Weekly Active Users Trend, New vs Returning Users)
- Fix Steering Adoption stat panel returning string instead of numeric value
- Add Playwright E2E test covering full pipeline flow and dashboard verification

* fix: add Apache license headers to e2e files, fix gofmt alignment

* fix: add SQL identifier validation to prevent SQL injection via table/column names (#8769)

Add ValidateTableName and ValidateColumnName functions in core/dal to ensure
table and column names used in dynamic SQL are safe identifiers. Applied to
scope_service_helper, scope_generic_helper, and customized_fields_extractor.
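
An allowlist check of this kind might look like the sketch below. The validateIdentifier function and its pattern are assumptions made for illustration; the commit message names ValidateTableName and ValidateColumnName but does not show their signatures or the exact rules they enforce.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern is an assumed allowlist: a letter or underscore followed
// by letters, digits, or underscores. Anything else is rejected.
var identifierPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateIdentifier returns an error when name is not a safe SQL
// identifier. kind ("table" or "column") is only used in the message.
func validateIdentifier(kind, name string) error {
	if !identifierPattern.MatchString(name) {
		return fmt.Errorf("invalid %s name: %q", kind, name)
	}
	return nil
}
```

Rejecting unsafe names before they are interpolated into dynamic SQL matters because table and column identifiers cannot be bound as query parameters the way values can.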

* feat(q-dev): add Kiro Credits + DORA Correlation dashboard (#8792)

Add a new Grafana dashboard that correlates Kiro AI usage (credits,
messages, active users) with DORA metrics at weekly aggregate level.

Panels include:
- Pearson's r correlation between weekly credits and PR cycle time
- High AI Usage vs Low AI Usage cycle time comparison
- Weekly credits vs deployment frequency trend
- Weekly credits vs change failure rate trend

Data is joined by week_start between _tool_q_dev_user_report and
project_pr_metrics / cicd_deployment_commits.
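
For reference, Pearson's r over two weekly series can be computed as in the sketch below. The dashboard itself computes this in SQL; the Go function here is only an illustration of the formula, and the name pearson and its error handling are assumptions.

```go
package main

import (
	"errors"
	"math"
)

// pearson returns Pearson's correlation coefficient between x and y,
// for example weekly credits (x) and weekly PR cycle time (y).
func pearson(x, y []float64) (float64, error) {
	if len(x) != len(y) || len(x) < 2 {
		return 0, errors.New("need two series of equal length with at least two points")
	}
	n := float64(len(x))
	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Accumulate the covariance and both variances (all unnormalised;
	// the common 1/n factor cancels in the ratio).
	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}
	if varX == 0 || varY == 0 {
		return 0, errors.New("correlation is undefined for a constant series")
	}
	return cov / math.Sqrt(varX*varY), nil
}
```

A value near -1 would suggest that weeks with more credits tend to have shorter cycle times, a value near 1 the opposite, and a value near 0 no linear relationship; with only a handful of weekly points the coefficient is noisy.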

* feat(q-dev): add AI Cost-Efficiency dashboard (#8793)

Add a Grafana dashboard showing AI tool cost-efficiency metrics:
- Credits per merged PR (overall + weekly trend)
- Credits per production deployment (overall + weekly trend)
- Credits per issue resolved (overall + weekly trend)
- Weekly AI activity volume (credits, messages, conversations)

Joins _tool_q_dev_user_report with pull_requests,
cicd_deployment_commits, and issues by weekly aggregation.

* feat(q-dev): add Multi-AI Tool Comparison dashboard (Copilot vs Kiro) (#8794)

Add a Grafana dashboard comparing GitHub Copilot and Kiro side by side:
- Weekly active users comparison
- Code suggestions & acceptance events (per tool)
- LOC accepted comparison (combined time series)
- Acceptance rate comparison (bar gauge)

Template variables for Copilot connection/scope selection.
Data from _tool_copilot_enterprise_daily_metrics vs
_tool_q_dev_user_report and _tool_q_dev_user_data.

* feat(q-dev): add Kiro AI Model ROI dashboard (#8795)

Add a Grafana dashboard analyzing per-model performance from chat logs:
- Model Performance Summary table (requests, share%, avg prompt/response
  length, response/prompt ratio, steering/spec mode usage)
- Daily Model Usage Distribution (stacked bar chart)
- Avg Response Length by Model trend (output quality proxy)

Data source: _tool_q_dev_chat_log grouped by model_id.

* feat(q-dev): add Steering & Spec Mode Adoption dashboard (#8798)

Track Kiro steering rules and spec mode adoption:
- User/request adoption rate stats
- Weekly adoption rate trend
- Steering impact on prompt/response length
- Per-user feature adoption table

* feat(q-dev): add Developer AI Productivity Hours dashboard (#8797)

Analyze when developers are most productive with AI tools:
- AI Activity by Hour of Day (chat + completions stacked bar)
- Prompt & Response Length by Hour (complexity patterns)
- Feature Usage by Hour (steering/spec mode/plain chat)
- AI Activity by Day of Week

* feat(q-dev): add Language AI Heatmap dashboard (#8796)

Analyze AI-assisted coding patterns by programming language:
- Language Completion Profile table (requests, avg completions,
  context sizes, users per language)
- Daily Completions by Language (stacked bar)
- Active File Types During Chat (donut)
- Avg Context Size by Language trend (top 5)

* Fix/circleci column names (#8799)

* fix(circleci): rename created_at to created_date in jobs/workflows
Add migration to copy created_at -> created_date and update models/converters.

* fix(circleci): update pipeline parsing

* test(circleci): add incremental tests for collectors

* fix(jenkins): scope multi-branch build collection to current project (#8430) (#8781)

The branch jobs query in collectMultiBranchJobApiBuilds selected all
WorkflowJob entries across all multi-branch pipelines for a connection,
causing builds to be duplicated and misattributed. Filter by
_raw_data_params to collect only the current project's branch jobs.
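
Illustrative sketch of the scoping idea, not the code from this change.
The fix itself is a query filter; the in-memory version below only shows
the effect of matching on _raw_data_params. The branchJob type, the
filterByRawDataParams helper, and the sample parameter strings are
hypothetical.

```go
package main

import "fmt"

// branchJob is a hypothetical, simplified stand-in for a stored WorkflowJob row.
type branchJob struct {
	FullName      string
	RawDataParams string // value of the _raw_data_params column
}

// filterByRawDataParams keeps only the jobs whose _raw_data_params value
// equals the one belonging to the project currently being collected.
func filterByRawDataParams(jobs []branchJob, params string) []branchJob {
	var out []branchJob
	for _, j := range jobs {
		if j.RawDataParams == params {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	// Illustrative values only; the real column holds the collector's own parameters.
	jobs := []branchJob{
		{FullName: "pipeline-a/main", RawDataParams: "params-for-pipeline-a"},
		{FullName: "pipeline-a/dev", RawDataParams: "params-for-pipeline-a"},
		{FullName: "pipeline-b/main", RawDataParams: "params-for-pipeline-b"},
	}
	for _, j := range filterByRawDataParams(jobs, "params-for-pipeline-a") {
		fmt.Println(j.FullName)
	}
}
```

Without the filter, all three branch jobs would be collected for either
pipeline, which is the duplication described above.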

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Make gh-copilot plugin database agnostic (#8779)

Co-authored-by: Eldrick Wega <eldrick.wega@outlook.com>

* fix(sonarqube): increase cq_issues and cq_file_metrics project_key length to 500 (#8783)

Fixes #8331

* feat: added taiga plugin (#8755)

* feat: added taiga plugin

* fix: fixed tests

* feat(gh-copilot): add support for organization daily user metrics (#8747)

* feat(circleci): add server version requirement and endpoint help text (#8757)

Update CircleCI connection form to indicate Server v4.x+ requirement
and provide guidance for server endpoint configuration.

Signed-off-by: Joshua Smith <jbsmith7741@gmail.com>

* fix: fixed test files

---------

Signed-off-by: Joshua Smith <jbsmith7741@gmail.com>
Co-authored-by: Reece Ward <47779818+ReeceXW@users.noreply.github.com>
Co-authored-by: Joshua Smith <jbsmith7741@gmail.com>

* fix(docker): pin Poetry to 2.2.1 for Python 3.9 compatibility (#8735)

Poetry 2.3.0 dropped Python 3.9 support. Without cache the installer
fetches the latest version (currently 2.3.2), which fails on the
python:3.9-slim-bookworm base image. Pin to 2.2.1, the last release
compatible with Python 3.9.

Co-authored-by: Rodrigo Silva <rodrigo.silva@bonial.com>

* fix(linker): scope clearHistoryData to current project only (#8814) (#8815)

The clearHistoryData() function used a LEFT JOIN with project_name
in the ON clause, causing the subquery to return all PR IDs regardless
of project. This effectively wiped the entire pull_request_issues table
on every linker run, deleting links from other projects sharing the
same repos and links created by the GitHub converter.

Fix:
- Use INNER JOIN + WHERE for proper project scoping
- Add issue-side subquery scoped to current project's boards
- Filter by _raw_data_table/_raw_data_remark to only delete
  linker-created rows
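
Illustrative sketch of why the join type matters, using a simplified,
hypothetical schema (pullRequest, projectMapping) and in-memory joins
rather than the actual query. A LEFT JOIN emits every left-hand row even
when the ON condition fails, so a project filter placed in ON does not
restrict the result. An INNER JOIN does.

```go
package main

import "fmt"

// Hypothetical, simplified rows; the real tables have more columns.
type pullRequest struct{ ID, RepoID string }
type projectMapping struct{ ProjectName, RepoID string }

// leftJoinPRIDs mimics a LEFT JOIN with the project filter in the ON clause:
// unmatched pull requests are still emitted (with a NULL right-hand side).
func leftJoinPRIDs(prs []pullRequest, maps []projectMapping, project string) []string {
	var ids []string
	for _, pr := range prs {
		matched := false
		for _, m := range maps {
			if m.RepoID == pr.RepoID && m.ProjectName == project {
				ids = append(ids, pr.ID)
				matched = true
			}
		}
		if !matched {
			ids = append(ids, pr.ID) // left row kept despite no match
		}
	}
	return ids
}

// innerJoinPRIDs mimics an INNER JOIN with the project filter applied:
// only pull requests mapped to the given project are emitted.
func innerJoinPRIDs(prs []pullRequest, maps []projectMapping, project string) []string {
	var ids []string
	for _, pr := range prs {
		for _, m := range maps {
			if m.RepoID == pr.RepoID && m.ProjectName == project {
				ids = append(ids, pr.ID)
			}
		}
	}
	return ids
}

func main() {
	prs := []pullRequest{{"pr-1", "repo-a"}, {"pr-2", "repo-b"}}
	maps := []projectMapping{{"project-a", "repo-a"}, {"project-b", "repo-b"}}

	fmt.Println(leftJoinPRIDs(prs, maps, "project-a"))  // [pr-1 pr-2]
	fmt.Println(innerJoinPRIDs(prs, maps, "project-a")) // [pr-1]
}
```

The sketch covers only the first bullet. When two projects share a repo,
the join alone still matches both, which is what the issue-side subquery
and the raw-data filters address.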

Add e2e test for cross-project shared repo scenario.

* fix(circleci): prevent negative values when calculating circleci workflow duration (#8800)
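
Illustrative sketch of a non-negative duration calculation, not the code
from this change. It assumes a negative value arises when the stop
timestamp is missing or earlier than the start timestamp. The
workflowDurationSec helper and its string parameters are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// workflowDurationSec returns the duration in seconds between two RFC 3339
// timestamps, never below zero. An empty stoppedAt (assumed here to mean the
// workflow has not finished) yields 0.
func workflowDurationSec(createdAt, stoppedAt string) (float64, error) {
	if stoppedAt == "" {
		return 0, nil
	}
	start, err := time.Parse(time.RFC3339, createdAt)
	if err != nil {
		return 0, err
	}
	stop, err := time.Parse(time.RFC3339, stoppedAt)
	if err != nil {
		return 0, err
	}
	d := stop.Sub(start).Seconds()
	if d < 0 {
		return 0, nil // clamp instead of storing a negative duration
	}
	return d, nil
}

func main() {
	// Normal case: stop is after start.
	d, err := workflowDurationSec("2026-03-19T18:00:00Z", "2026-03-19T18:07:59Z")
	fmt.Println(d, err)

	// Inverted timestamps are clamped to zero.
	d, err = workflowDurationSec("2026-03-19T18:07:59Z", "2026-03-19T18:00:00Z")
	fmt.Println(d, err)
}
```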

* fix: sonarqube: missing api/users/search endpoint (#8813)

* fix(argocd): extract revision from multi-source application revisions[] (#8810)
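
Illustrative sketch of picking a git commit SHA out of revisions[], not
the resolveMultiSourceRevision() implementation from this change. It
assumes the commit is the first entry that is a full 40-character
lowercase hex string. The pickCommitSHA helper is hypothetical, and the
sample values come from the PR description.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullSHA matches a full-length, lowercase git commit SHA-1.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// pickCommitSHA returns the first entry of revisions that looks like a git
// commit SHA, or "" when none does (for example, only Helm chart versions).
func pickCommitSHA(revisions []string) string {
	for _, r := range revisions {
		if fullSHA.MatchString(r) {
			return r
		}
	}
	return ""
}

func main() {
	revisions := []string{
		"2.6.2", // Helm chart version
		"5dd95b4efd7e9b668c361bbddb8d7f1e56c32ac1", // git commit SHA
	}
	fmt.Println(pickCommitSHA(revisions))
}
```

A pattern match is only one possible heuristic; it ignores the matching
sources[] entries and would miss abbreviated SHAs.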

---------

Signed-off-by: Joshua Smith <jbsmith7741@gmail.com>
Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: tamas.albert <tamas.laczkoalbert@concentrix.com>
Co-authored-by: Warren Chen <warren.chen830@gmail.com>
Co-authored-by: Ema Abitante <ema.abitante@gmail.com>
Co-authored-by: Spiff Azeta <35563797+spiffaz@users.noreply.github.com>
Co-authored-by: Spiff Azeta <spiffazeta@gmail.com>
Co-authored-by: Eldrick Wega <eldrick.wega@outlook.com>
Co-authored-by: Dan Crews <crewsd@gmail.com>
Co-authored-by: Tomoya Kawaguchi <68677002+yamoyamoto@users.noreply.github.com>
Co-authored-by: Joshua Smith <jbsmith7741@gmail.com>
Co-authored-by: jawad khan <jawadkhan444@gmail.com>
Co-authored-by: Leif Roger Frøysaa <leif.roger.froysaa@akerbp.com>
Co-authored-by: Klesh Wong <klesh@qq.com>
Co-authored-by: NaRro <cong.wang@merico.dev>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Reece Ward <47779818+ReeceXW@users.noreply.github.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: Chris Pavlicek <varsis@users.noreply.github.com>
Co-authored-by: AvivGuiser <avivguiser@gmail.com>
Co-authored-by: Shayne Clausson <shayne.clausson@extendaretail.com>
Co-authored-by: irfanuddinahmad <34648393+irfanuddinahmad@users.noreply.github.com>
Co-authored-by: Rodrigo Silva <rodrigoluizscs@gmail.com>
Co-authored-by: Rodrigo Silva <rodrigo.silva@bonial.com>
Co-authored-by: Daniele M. <github@dmoraschi.com>
Co-authored-by: Pavel Sturc <psturc@redhat.com>
Co-authored-by: Anvesh Vemula <39478419+vemulaanvesh@users.noreply.github.com>

Labels

- pr-type/bug-fix (This PR fixes a bug)
- size:L (This PR changes 100-499 lines, ignoring generated files.)
